# AGTR Miscellaneous Evaluation Scripts

These are miscellaneous scripts used during the development and evaluation of
the peHash AGTR.

### Installing Dependencies

    sudo -H python3 -m pip install pymongo
    sudo -H python3 -m pip install pefile
    sudo apt-get install python3-tlsh
    git clone https://github.com/knowmalware/pehash.git
    sudo -H python3 -m pip install -e pehash

### meta.py

Stores metadata about a malware dataset in a MongoDB database (via PyMongo).
We tested multiple metadata hashes and similarity hashes for AGTR suitability,
including peHash, Imphash, TLSH, and Rich header hash. Of these, we found that
only peHash was suitable, due to its very low false positive rate.

```
usage: meta.py [-h] [--malware-dir MALWARE_DIR] [--db-name DB_NAME]

optional arguments:
  -h, --help            show this help message and exit
  --malware-dir MALWARE_DIR
  --db-name DB_NAME
```
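
As a rough illustration of the kind of record a script like meta.py might
store, the sketch below builds a per-file metadata document and shows how
third-party hashes could be plugged in. The helper names (`build_metadata`,
`imphash`, `store_metadata`), the field names, and the default database and
collection names are all hypothetical; they are not taken from meta.py.

```python
import hashlib


def build_metadata(data, hashers=None):
    """Return a metadata document for one file's raw bytes.

    `hashers` maps a field name to a callable that takes bytes and returns
    a hash string. A hasher that raises is recorded as None.
    """
    doc = {
        "_id": hashlib.sha256(data).hexdigest(),  # hypothetical key choice
        "size": len(data),
    }
    for name, hasher in (hashers or {}).items():
        try:
            doc[name] = hasher(data)
        except Exception:
            doc[name] = None  # e.g. the file is not a valid PE
    return doc


def imphash(data):
    """Import hash via pefile (third-party, imported lazily)."""
    import pefile
    return pefile.PE(data=data).get_imphash()


def store_metadata(doc, db_name="agtr", collection_name="metadata"):
    """Upsert one document into a local MongoDB (names are assumptions)."""
    from pymongo import MongoClient
    client = MongoClient()
    client[db_name][collection_name].replace_one(
        {"_id": doc["_id"]}, doc, upsert=True
    )
```

With pefile installed, `build_metadata(data, {"imphash": imphash})` would add
an Imphash field; other hashes could be added the same way.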

### label.py

Labels malware using AVClass run with different settings and stores the
results in a MongoDB database. Used to collect AVClass labels under default
settings, as well as for the alias resolution, plurality threshold, and
heuristic removal bounds.

```
usage: label.py [-h] [--label-dir LABEL_DIR] [--db-name DB_NAME]
                [--drop-database] [--collection-name COLLECTION_NAME]
                {avclass,dir} ...

optional arguments:
  -h, --help            show this help message and exit
  --label-dir LABEL_DIR
  --db-name DB_NAME
  --drop-database
  --collection-name COLLECTION_NAME

modes:
  {avclass,dir}
```
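
To make the plurality threshold setting concrete, here is a minimal sketch
of plurality voting over per-engine family names. It is not AVClass's
implementation: the function name, the input format, the default threshold,
and the fallback value are assumptions chosen for illustration.

```python
from collections import Counter


def plurality_label(av_families, threshold=2, fallback="SINGLETON"):
    """Pick the family named by the most AV engines.

    `av_families` maps an engine name to the family token extracted from
    its label (or None if no family token was found). The winning family
    must be named by at least `threshold` engines; otherwise `fallback`
    is returned.
    """
    counts = Counter(f for f in av_families.values() if f)
    if not counts:
        return fallback
    family, votes = counts.most_common(1)[0]
    return family if votes >= threshold else fallback
```

Raising `threshold` trades coverage for confidence: fewer samples receive a
family label, but each label is backed by more engines.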

### eval_pehash.py

You must run meta.py and label.py before running this script. It reads
metadata and labels from the MongoDB database and constructs an AGTR from
them, then prints the precision, recall, and accuracy bounds given the
predicted labels and the AGTR. It is also used to perform the
precision-recall shuffle experiment.
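
The comparison between predicted labels and an AGTR can be sketched as
follows. This assumes the AGTR groups samples by peHash value and uses the
common overlap-based definitions of clustering precision and recall; the
function names and input format are hypothetical, and eval_pehash.py may
compute its bounds differently.

```python
from collections import Counter, defaultdict


def _overlap_score(source, target):
    """Sum, over clusters in `source`, of the largest overlap with any
    single cluster in `target`, divided by the number of shared samples."""
    shared = source.keys() & target.keys()
    if not shared:
        return 0.0
    overlaps = defaultdict(Counter)
    for sample in shared:
        overlaps[source[sample]][target[sample]] += 1
    return sum(max(c.values()) for c in overlaps.values()) / len(shared)


def precision_recall(predicted, reference):
    """Clustering precision and recall of `predicted` against `reference`.

    Both arguments map a sample ID to a cluster ID (for example, a
    predicted family name and a peHash value).
    """
    precision = _overlap_score(predicted, reference)
    recall = _overlap_score(reference, predicted)
    return precision, recall
```

Under these definitions, and assuming every AGTR group lies within a single
true family, the precision measured against the AGTR cannot exceed the
precision against the true families, and the recall cannot fall below it.
That is one way values computed against an AGTR can serve as bounds.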


### avclass_common3.py and avclass_common3_mod.py

Ports of avclass_common.py to Python 3. avclass_common3_mod.py adds custom
functionality for heuristic signature removal.